Looking at Vehicle thefts from 2003-2024 in

San Francisco Police department crime data

Since the year 2003 the police department of san francisco has been reporting crime data. All these data are available to the public and therefore relevant to do data analysis on. Of particular interest for data analysis is the different crime types, the time of incident (both date but also time of day down to the minute) but also the coordinates of the incident (given in latitude/longitude). From this data its possible to look at temporal and spatial trends of different crimes over the last 20+ years. The different categories of crimes include vehicle theft, vandalism, robbery, prostitution and many more.

We have chosen to look at the trends of vehicle thefts since the trend are “unique” compared to the other types of crimes. This will be shown in the following plots.

The temporal trend of vehicle thefts

The first relevant plot is the number of incidents of vehicle theft per year. We ignore 2025 since the we do not have data for that entire year.

Code
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
data=pd.read_csv("C:/NoterDTU/6_semester/Social_data/website_2/s224394.github.io/merged_data.csv")
crimes = data[['Category', 'Year']]
crimes = crimes[(crimes['Category']=='VEHICLE THEFT') & (crimes['Year']!=2025)  ]
crime_counts = crimes["Year"].value_counts().sort_index()
plt.figure().set_figwidth(7.4)
crime_counts.plot(kind="bar",color="indigo",edgecolor="black")
ax=plt.gca()

ax.set_facecolor("#f5f5f5")
plt.ylabel("Number of incidents")
plt.xlabel("Year")
plt.title("Number of Vehicle thefts per year (2003-2024)")
plt.show()
Figure 1: The plot shows the number of reported incidents of “Vehicle Theft” by the SFPD for every year 2003-2024. There is a 60% percent drop from 2005 to 2006 which might suggest that somethinge was actively done in order to reduce the number of vehicle thefts.

One thing that seems unique is the sudden drop from 2005 to 2006 and onwards. In 2005 the numbers peak at around 17.500 vehicle thefts while the next year it drops by around 10.000 and remains in that range going forward. This is approximately 60% of the crimes that just stopped happening in one year. One would suspect that the police force might have increased the resources for fighting vehicle theft. Another explanation might be that cars are harder to steal today because of increased security measures on the cars.

In the news article “Car Thefts Decrease Statewide” by east bay times from 2007 (Staff 2007) talks aboutWhere the general trend for vehicle theft are on the decline. The reason behind this trend is both the fact that more and more vehicle have implemented alarms, key-coding systems. But also there has also been set up 16 auto-theft task forces. There ways also an increase in the use of so called “bait-cars”. The way “bait-cars” work is by including a hidden gps in the car and making it intentionally easy to steal. The cars are then used to track down the drivers. Since the car thieves often are responsible for stealing more than one car. Catching one criminal might decrease the number of cars being stolen significantly. In 2006 they made 357 arrest with the use of bait-cars. Which might have severely impacted the amount of cars stolen.

We also see an upward trend in vehicle thefts after post-2010 and post-2020. One theory as to why this is the case is related to the two economic crisis. The 2008 financial crisis and the beginning of the covid-19 pandemic in 2020 might have pushed more people into poverty. Which in turn then makes it so more people commits crime in order to survive.

Correlation between crimes

In order to determine how unique the trend of vehicle theft are it makes sense to compare the data to the other crimes reported. We choose to look at the the number of incidents per month for a given crime. Afterwards we see how correlated the data for two crimes are. The correlation coefficient for the given plot are also calculated as to give a statistical measure of how correlated the data are. In the plot its possible to choose which crime categories to compare. If you hover over one of the points you get the month of the data and also the number of incidents.

Code
from bokeh.io import output_notebook, show
from bokeh.layouts import column, row
from bokeh.models import Select, Slope, Label, CustomJS, HoverTool
from bokeh.plotting import figure, ColumnDataSource

# Configure Bokeh to load silently
output_notebook(hide_banner=True)

# Define focus crimes
focuscrimes = {
    'WEAPON LAWS', 'PROSTITUTION', 'ROBBERY', 'BURGLARY', 'ASSAULT', 
    'DRUG/NARCOTIC', 'LARCENY/THEFT', 'VANDALISM', 'VEHICLE THEFT', 'STOLEN PROPERTY'
}


# Load data
df = pd.read_csv("C:/NoterDTU/6_semester/Social_data/website_2/s224394.github.io/merged_data.csv")

# Filter and process data
df_focus = df[df['Category'].isin(focuscrimes)]
df_focus_grouped = df_focus.groupby(['Year', 'Month', 'Category']).size().reset_index(name='Crime_Count')
df_focus_grouped['Date'] = pd.to_datetime(df_focus_grouped['Month'] + ' ' + df_focus_grouped['Year'].astype(str), errors='coerce')
df_focus_grouped = df_focus_grouped.dropna()

# Extract month and year for hover tool
df_focus_grouped['Month_Year'] = df_focus_grouped['Date'].dt.strftime('%b %Y')

# Pivot the data
df_pivot = df_focus_grouped.pivot_table(index=['Date', 'Month_Year'], columns='Category', values='Crime_Count', fill_value=0)
df_pivot['Total Crimes'] = df_pivot.sum(axis=1)
df_pivot.reset_index(inplace=True)

# Prepare plotting data
numeric_cols = [col for col in df_pivot.columns if col not in ['Date', 'Month_Year']]
df_plot = df_pivot[numeric_cols]

# Set initial variables
x_init = numeric_cols[8]
y_init = numeric_cols[1]
x_data = df_plot[x_init].values
y_data = df_plot[y_init].values

# Calculate initial regression
n = len(x_data)
x_sum, y_sum, xy_sum, x2_sum, y2_sum = x_data.sum(), y_data.sum(), (x_data*y_data).sum(), (x_data**2).sum(), (y_data**2).sum()
slope_val = (n * xy_sum - x_sum * y_sum) / (n * x2_sum - x_sum * x_sum)
intercept = (y_sum - slope_val * x_sum) / n
r_value = (n * xy_sum - x_sum * y_sum) / np.sqrt((n * x2_sum - x_sum * x_sum) * (n * y2_sum - y_sum * y_sum))
r_squared = r_value ** 2

# Create ColumnDataSource with Month_Year for hover tool
source = ColumnDataSource(df_pivot)

# Create figure with initial axis labels
plot = figure(
    title="Crime Data Correlation Analysis", 
    x_axis_label="Number of incidents for X-axis crime type (month,year)",
    y_axis_label="Number of incidents for Y-axis crime type (month,year)",
    tools="pan,wheel_zoom,box_zoom,reset",
    width=750, 
    height=550,
    background_fill_color="#f5f5f5",
    toolbar_location="above"
)

# Format plot appearance
plot.title.text_font_size = '16pt'
plot.xaxis.axis_label_text_font_size = "12pt"
plot.yaxis.axis_label_text_font_size = "12pt"
plot.grid.grid_line_alpha = 0.3

# Add only the month-year hover tool
hover = HoverTool(
    tooltips=[
        ("Time Period", "@Month_Year"),
        (x_init, f"@{{{x_init}}}"),
        (y_init, f"@{{{y_init}}}"),
        ("Total Crimes", "@{Total Crimes}")
    ],
    mode='mouse'
)
plot.add_tools(hover)

# Initial scatter plot
scatter = plot.scatter(x=x_init, y=y_init, source=source, size=10,
                      color="indigo", alpha=0.7, line_color="white")

# Dropdown widgets
x_axis = Select(title="X-Axis Crime Type:", value=x_init,
               options=sorted(numeric_cols), width=250)
y_axis = Select(title="Y-Axis Crime Type:", value=y_init,
               options=sorted(numeric_cols), width=250)

# Regression line
slope = Slope(gradient=slope_val, y_intercept=intercept, 
             line_color='red', line_dash='dashed', line_width=2.5)
plot.add_layout(slope)

# R² label
r_squared_label = Label(x=70, y=10, x_units='screen', y_units='screen',
                       text=f"R² = {r_squared:.3f}", text_font_size='13px',
                       text_color='red', background_fill_color='white',
                       background_fill_alpha=0.8)
plot.add_layout(r_squared_label)

# JavaScript callback with axis label updates
callback = CustomJS(args=dict(
    source=source,
    scatter=scatter,
    slope=slope,
    r_squared_label=r_squared_label,
    plot=plot,
    x_axis=x_axis,
    y_axis=y_axis
), code="""
    const x = x_axis.value;
    const y = y_axis.value;
    const x_data = source.data[x];
    const y_data = source.data[y];
    
    // Calculate statistics
    let x_sum = 0, y_sum = 0, xy_sum = 0, x2_sum = 0, y2_sum = 0;
    const n = x_data.length;
    
    for (let i = 0; i < n; i++) {
        x_sum += x_data[i];
        y_sum += y_data[i];
        xy_sum += x_data[i] * y_data[i];
        x2_sum += x_data[i] * x_data[i];
        y2_sum += y_data[i] * y_data[i];
    }
    
    // Calculate regression parameters
    const slope_val = (n * xy_sum - x_sum * y_sum) / (n * x2_sum - x_sum * x_sum);
    const intercept = (y_sum - slope_val * x_sum) / n;
    const r_value = (n * xy_sum - x_sum * y_sum) / 
                   Math.sqrt((n * x2_sum - x_sum * x_sum) * (n * y2_sum - y_sum * y_sum));
    const r_squared = r_value * r_value;
    
    // Update plot elements
    scatter.glyph.x = {field: x};
    scatter.glyph.y = {field: y};
    slope.gradient = slope_val;
    slope.y_intercept = intercept;
    r_squared_label.text = `R² = ${r_squared.toFixed(3)}`;
    
    // Update axis labels
    plot.xaxis.axis_label = `${x} (Count)`;
    plot.yaxis.axis_label = `${y} (Count)`;
""")

# Connect callbacks
x_axis.js_on_change('value', callback)
y_axis.js_on_change('value', callback)

# Layout
layout = column(
    row(x_axis, y_axis, spacing=20),
    plot
)

# Show the plot
show(layout)
Figure 3: This plot number of incidents for two choosen crimes for a given month and year as a scatterplot. The correlation is calculated and there has been calculated a linear fit. The correlation coefficient between vehicle theft and any other crimes close to zero in most cases. This is not always the case when comparing other types of crimes.

Comparing vehicle theft to the other crimes one sees that there is close to zero correlation between any of the crimes and vehicle theft. The reason behind this is probably the incidents of the years 2003-2005. In most of the plots comparing vehicle data the data from 2003-2005 is completely separate from the rest of the data. This suggest that the sudden drop of vehicle thefts are unique compared all the other crimes. The plots suggest that if we only only looked at the data post-2006 then the correlation would be much greater. One theory as to why this is the case is because of the of focus an battling vehicle theft in 2006. Which lead to a 60% drop in crimes. Where as other crimes such as burglary and vandalism where not focused on. In order to fully determine if the behavior of vehicle theft are unique we also looked at the correlation between some of the other crimes. An example could be robbery and assault which are way more correlated (\(r^2\) value of 0.485) and generally if we compare most of the crimes with the total number of crimes, then they are fairly correlated. Examples being vandalism having \(r^2=0.486\) and larceny/theft having \(r^2=0.700\). Some of this might also be explained in larceny/theft and vandalism playing a bigger part of the crime incidents numbers. Not all types of crimes peak in the same months. But this ones again suggest that vehicle massive drop where unique.

Conclusion

We have in this article looked at three different plots. Where in particularly the temporal and corelation plots suggest that the trend of vehicle thefts incidents are unique compared to the temporal trends of other crimes. This is concluded both because the number of vehicle thefts drop by approximately 60% in one year. But also because the corelation between number of vehicle thefts compared to the incident number are generally close to zero. A news article from 2007 explains that the san francisco police department set up 16 auto-theft task forces and generally caught a lot of car thieves in 2006. This might explain why there was a sudden drop in vehicle theft incidents. This also might explain why the different crimes are not as much correlated with vehicle theft as they are with each other.

References

Staff, East Bay Times. 2007. “Car Thefts Decrease Statewide.” East Bay Times. https://www.eastbaytimes.com/2007/02/16/car-thefts-decrease-statewide/.